pretrained transformer
Freeze, Diffuse, Decode: Geometry-Aware Adaptation of Pretrained Transformer Embeddings for Antimicrobial Peptide Design
Gawade, Pankhil, Izdebski, Adam, Lizotte, Myriam, Moon, Kevin R., Rhodes, Jake S., Wolf, Guy, Szczurek, Ewa
Pretrained transformers provide rich, general-purpose embeddings that are transferred to downstream tasks. However, the current transfer strategies, fine-tuning and probing, either distort the pretrained geometric structure of the embeddings or lack sufficient expressivity to capture task-relevant signals. These issues become even more pronounced when supervised data are scarce. Here, we introduce Freeze, Diffuse, Decode (FDD), a novel diffusion-based framework that adapts pretrained embeddings to downstream tasks while preserving their underlying geometric structure. FDD propagates supervised signal along the intrinsic manifold of frozen embeddings, enabling geometry-aware adaptation of the embedding space. Applied to antimicrobial peptide design, FDD yields low-dimensional, predictive, and interpretable representations that support property prediction, retrieval, and latent-space interpolation.
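The paper's pipeline is not spelled out in the abstract, but the idea of propagating supervised signal along the manifold of frozen embeddings can be illustrated with ordinary graph diffusion: build a kNN affinity graph over the frozen embeddings, then alternately diffuse and clamp the known property values. A minimal sketch, assuming a Gaussian kernel and label-propagation-style updates; all names and hyperparameters are illustrative rather than FDD's actual components.

```python
import numpy as np

def diffusion_adapt(embeddings, labels, labeled_idx, k=10, sigma=1.0, steps=20):
    """Propagate supervised signal over a kNN affinity graph of frozen embeddings.

    embeddings : (n, d) frozen pretrained-transformer embeddings (never updated)
    labels     : property values for the labeled subset
    labeled_idx: indices of the labeled points in `embeddings`
    """
    n = embeddings.shape[0]
    # Pairwise squared distances and a Gaussian affinity kernel.
    d2 = ((embeddings[:, None, :] - embeddings[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma**2))
    # Keep only the k nearest neighbours per node (symmetrised), so the
    # diffusion follows the intrinsic manifold rather than ambient distances.
    keep = np.argsort(d2, axis=1)[:, :k + 1]
    mask = np.zeros_like(W, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    W = np.where(mask | mask.T, W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)      # row-stochastic diffusion operator
    # Initialise with known labels, then alternate diffusion and clamping.
    f = np.zeros(n)
    f[labeled_idx] = labels
    for _ in range(steps):
        f = P @ f
        f[labeled_idx] = labels               # clamp the supervised points
    return f                                   # diffused property estimate per point
```

In this simplified picture the embeddings themselves are never updated; only the scalar property estimate is diffused, which is what keeps the pretrained geometry intact.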
FAR: Function-preserving Attention Replacement for IMC-friendly Inference
Ren, Yuxin, Collins, Maxwell D, Hu, Miao, Yang, Huanrui
While transformers dominate modern vision and language models, their attention mechanism remains poorly suited for in-memory computing (IMC) devices due to intensive activation-to-activation multiplications and non-local memory access, leading to substantial latency and bandwidth overhead on ReRAM-based accelerators. To address this mismatch, we propose FAR, a Function-preserving Attention Replacement framework that substitutes all attention in pretrained DeiTs with sequential modules inherently compatible with IMC dataflows. Specifically, FAR replaces self-attention with a multi-head bidirectional LSTM architecture via block-wise distillation to retain functional equivalence while enabling linear-time computation and localized weight reuse. We further incorporate structured pruning on FAR models, enabling flexible adaptation to resource-constrained IMC arrays while maintaining functional fidelity. Evaluations on the DeiT family demonstrate that FAR maintains accuracy comparable to the original attention-based models on ImageNet and multiple downstream tasks, with fewer parameters and lower latency. Further analysis shows that FAR preserves the semantic token relationships learned by attention while improving computational efficiency, highlighting its potential for energy-efficient transformer inference on IMC-based edge accelerators.
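As a rough illustration of the replacement-plus-distillation recipe described above, the sketch below swaps a generic attention block for a multi-head bidirectional LSTM token mixer and fits it to the frozen block's outputs with an MSE distillation loss. It is a schematic under assumed shapes, not FAR's actual module or training code.

```python
import random

import torch
import torch.nn as nn

class MultiHeadBiLSTM(nn.Module):
    """Drop-in token mixer: several small bidirectional LSTMs run in parallel
    over the token sequence; their outputs are concatenated and projected,
    mirroring the multi-head layout of the attention block they replace."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        assert dim % (2 * num_heads) == 0
        head_dim = dim // num_heads
        self.heads = nn.ModuleList(
            nn.LSTM(head_dim, head_dim // 2, bidirectional=True, batch_first=True)
            for _ in range(num_heads)
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                 # x: (batch, tokens, dim)
        chunks = x.chunk(len(self.heads), dim=-1)
        outs = [lstm(c)[0] for lstm, c in zip(self.heads, chunks)]
        return self.proj(torch.cat(outs, dim=-1))

def distill_block(teacher_attn, student_mixer, cached_acts, lr=1e-3, steps=200):
    """Block-wise distillation: fit the replacement module to the frozen attention
    block's outputs on cached intermediate activations from the pretrained model."""
    opt = torch.optim.Adam(student_mixer.parameters(), lr=lr)
    for _ in range(steps):
        x = random.choice(cached_acts)                    # one (batch, tokens, dim) batch
        with torch.no_grad():
            target = teacher_attn(x)
        loss = nn.functional.mse_loss(student_mixer(x), target)
        opt.zero_grad(); loss.backward(); opt.step()
    return student_mixer
```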
Provable test-time adaptivity and distributional robustness of in-context learning
Ma, Tianyi, Wang, Tengyao, Samworth, Richard J.
We study in-context learning problems where a Transformer is pretrained on tasks drawn from a mixture distribution $\pi=\sum_{\alpha\in\mathcal{A}} \lambda_\alpha \pi_\alpha$, called the pretraining prior, in which each mixture component $\pi_\alpha$ is a distribution on tasks of a specific difficulty level indexed by $\alpha$. Our goal is to understand the performance of the pretrained Transformer when evaluated on a different test distribution $\mu$, consisting of tasks of fixed difficulty $\beta\in\mathcal{A}$, and with potential distribution shift relative to $\pi_\beta$, subject to the chi-squared divergence $\chi^2(\mu,\pi_\beta)$ being at most $\kappa$. In particular, we consider nonparametric regression problems with random smoothness, and multi-index models with random smoothness as well as random effective dimension. We prove that a large Transformer pretrained on sufficient data achieves the optimal rate of convergence corresponding to the difficulty level $\beta$, uniformly over test distributions $\mu$ in the chi-squared divergence ball. Thus, the pretrained Transformer is able to achieve faster rates of convergence on easier tasks and is robust to distribution shift at test time. Finally, we prove that even if an estimator had access to the test distribution $\mu$, the convergence rate of its expected risk over $\mu$ could not be faster than that of our pretrained Transformers, thereby providing a more appropriate optimality guarantee than minimax lower bounds.
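To make the setup concrete, the toy sampler below draws pretraining tasks from a difficulty-indexed mixture prior, reading the difficulty level here as the effective dimension of a multi-index regression task. The tanh link, dimensions, and mixture weights are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task(alpha):
    """Draw one in-context regression task whose difficulty is indexed by alpha,
    interpreted (for illustration) as the effective dimension of a multi-index model."""
    d, n = 16, 64                                            # ambient dim, prompt length
    U = np.linalg.qr(rng.standard_normal((d, alpha)))[0]     # random alpha-dim index space
    w = rng.standard_normal(alpha)
    X = rng.standard_normal((n, d))
    y = np.tanh(X @ U @ w) + 0.1 * rng.standard_normal(n)    # link applied to projected input
    return X, y

# Pretraining prior pi = sum_alpha lambda_alpha * pi_alpha over difficulty levels.
levels = [1, 2, 4, 8]                        # A: candidate effective dimensions
lam = np.array([0.4, 0.3, 0.2, 0.1])         # mixture weights lambda_alpha
alpha = rng.choice(levels, p=lam)            # difficulty level of this pretraining task
X, y = sample_task(alpha)
```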
Knowledge Circuits in Pretrained Transformers
The remarkable capabilities of modern large language models are rooted in their vast repositories of knowledge encoded within their parameters, enabling them to perceive the world and engage in reasoning. The inner workings of how these models store knowledge have long been a subject of intense interest and investigation among researchers. To date, most studies have concentrated on isolated components within these models, such as multilayer perceptrons and attention heads. In this paper, we delve into the computation graph of the language model to uncover the knowledge circuits that are instrumental in articulating specific knowledge. The experiments, conducted with GPT2 and TinyLLAMA, have allowed us to observe how certain information heads, relation heads, and multilayer perceptrons collaboratively encode knowledge within the model.
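Full circuit discovery operates on edges of the computation graph, but a simplified, head-level proxy already conveys the flavor: ablate one attention head at a time (via the head_mask argument of Hugging Face's GPT-2) and record how much the logit of the correct factual completion drops. The prompt and the reading of "circuit membership" here are illustrative only, not the paper's method.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "The Eiffel Tower is located in the city of"
ids = tok(prompt, return_tensors="pt").input_ids
target_id = tok(" Paris").input_ids[0]         # first subword of the expected completion

def target_logit(head_mask):
    """Logit of the correct completion with the given (layers x heads) mask applied."""
    with torch.no_grad():
        logits = model(ids, head_mask=head_mask).logits
    return logits[0, -1, target_id].item()

L, H = model.config.n_layer, model.config.n_head
base = target_logit(torch.ones(L, H))

# Ablate one attention head at a time and record how much the correct-token
# logit drops; heads with large drops are candidate members of the circuit.
effects = []
for l in range(L):
    for h in range(H):
        mask = torch.ones(L, H)
        mask[l, h] = 0.0
        effects.append(((l, h), base - target_logit(mask)))

print(sorted(effects, key=lambda e: -e[1])[:5])   # most influential heads
```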
FASTopic: Pretrained Transformer is a Fast, Adaptive, Stable, and Transferable Topic Model
Topic models have been evolving rapidly over the years, from conventional to recent neural models. In this paper, we propose FASTopic, a fast, adaptive, stable, and transferable topic model. FASTopic follows a new paradigm: Dual Semantic-relation Reconstruction (DSR). DSR models the semantic relations between document embeddings from a pretrained transformer and learnable topic and word embeddings, and discovers latent topics by reconstructing these relations. This brings about a neat and efficient topic modeling framework.
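One heavily simplified reading of such a reconstruction objective is sketched below: document-topic and topic-word relations are both parameterized by softmaxed embedding similarities, and the bag-of-words matrix is reconstructed from their product. The toy data and the similarity-softmax parameterization are assumptions for illustration, not FASTopic's actual relation model.

```python
import torch
import torch.nn.functional as F

# Shapes: D docs, V vocab words, K topics, H embedding dim.
D, V, K, H = 1000, 5000, 50, 384
doc_emb = torch.randn(D, H)            # frozen document embeddings from a pretrained transformer
word_emb = torch.nn.Parameter(torch.randn(V, H))
topic_emb = torch.nn.Parameter(torch.randn(K, H))
bow = torch.rand(D, V)                 # document bag-of-words weights (toy data)

opt = torch.optim.Adam([word_emb, topic_emb], lr=1e-2)
for _ in range(200):
    # Relation 1: documents <-> topics; Relation 2: topics <-> words.
    doc_topic = F.softmax(doc_emb @ topic_emb.T, dim=-1)      # (D, K)
    topic_word = F.softmax(topic_emb @ word_emb.T, dim=-1)    # (K, V)
    recon = doc_topic @ topic_word                            # reconstructed word distributions
    loss = -(bow * torch.log(recon + 1e-9)).sum(dim=-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```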
Pretrained Transformer Efficiently Learns Low-Dimensional Target Functions In-Context
Transformers can efficiently learn in-context from example demonstrations. Most existing theoretical analyses have studied the in-context learning (ICL) ability of transformers for linear function classes, where it is typically shown that the minimizer of the pretraining loss implements one gradient descent step on the least squares objective. However, this simplified linear setting arguably does not demonstrate the statistical efficiency of ICL, since the pretrained transformer does not outperform directly solving linear regression on the test prompt. Our result highlights the adaptivity of the pretrained transformer to low-dimensional structures of the function class, which enables sample-efficient ICL that outperforms estimators that only have access to the in-context data.
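The linear baseline mentioned above is easy to make concrete: under the cited analyses, the pretrained transformer's in-context prediction reduces to one gradient-descent step on the prompt's least-squares objective, which can then be compared with solving the regression on the prompt directly. A small sketch, with the zero initialization and step size chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 20, 40
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_star + 0.1 * rng.standard_normal(n)
x_query = rng.standard_normal(d)

# What the linear-ICL analyses show the pretrained transformer implements:
# one gradient-descent step on the in-context least-squares objective,
# starting from w = 0 with some learned step size eta.
eta = 1.0 / n
w_one_step = eta * X.T @ y              # grad of (1/2)||Xw - y||^2 at w = 0 is -X^T y
pred_icl = x_query @ w_one_step

# The baseline it is compared against: solve least squares on the prompt directly.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_ls = x_query @ w_ls

print(pred_icl, pred_ls, x_query @ w_star)
```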
EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals
Electroencephalography (EEG) is crucial for recording brain activity, with applications in medicine, neuroscience, and brain-computer interfaces (BCI). However, challenges such as low signal-to-noise ratio (SNR), high inter-subject variability, and channel mismatch complicate the extraction of robust, universal EEG representations. We propose EEGPT, a novel 10-million-parameter pretrained transformer model designed for universal EEG feature extraction. In EEGPT, we design a mask-based dual self-supervised learning method for efficient feature extraction. Compared to other mask-based self-supervised learning methods, EEGPT introduces spatio-temporal representation alignment.
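A rough sketch of what a mask-based dual objective can look like on EEG-shaped data is given below: spatio-temporal patches are masked, the sequence with mask tokens is encoded, and the loss combines raw-signal reconstruction with alignment between features of the masked and unmasked views. The shapes, the vanilla transformer encoder, and the gradient-free full view used as the alignment target are assumptions, not EEGPT's actual design.

```python
import torch
import torch.nn as nn

# Toy shapes: B trials, C channels, T time points split into P patches of length S.
B, C, T, S, H = 8, 19, 1024, 64, 128
P = T // S

patchify = nn.Linear(C * S, H)                       # (channels x patch samples) -> token
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(H, nhead=4, batch_first=True), 3)
decoder = nn.Linear(H, C * S)                        # reconstruct the raw masked patch
mask_token = nn.Parameter(torch.zeros(1, 1, H))

x = torch.randn(B, C, T)
patches = x.unfold(-1, S, S).permute(0, 2, 1, 3).reshape(B, P, C * S)   # (B, P, C*S)
tokens = patchify(patches)

# Randomly mask half of the spatio-temporal patches.
mask = torch.rand(B, P) < 0.5
tokens_masked = torch.where(mask.unsqueeze(-1), mask_token.expand(B, P, H), tokens)

# Objective 1: reconstruct the raw signal inside the masked patches.
z = encoder(tokens_masked)
recon_loss = ((decoder(z) - patches) ** 2)[mask].mean()

# Objective 2 (alignment): make features of the masked view match features of the
# unmasked view, computed without gradients as a stand-in for a target encoder.
with torch.no_grad():
    z_full = encoder(tokens)
align_loss = ((z - z_full) ** 2)[mask].mean()

loss = recon_loss + align_loss
```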
Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making
Wang, Hanzhao, Pan, Yu, Sun, Fupeng, Liu, Shang, Talluri, Kalyan, Chen, Guanting, Li, Xiaocheng
In this paper, we consider the supervised pretrained transformer for a class of sequential decision-making problems. The class of considered problems is a subset of the general formulation of reinforcement learning in that there is no transition probability matrix, and it covers bandits, dynamic pricing, and newsvendor problems as special cases. Such a structure enables the use of optimal actions/decisions in the pretraining phase, and this usage also provides new insights into the training and generalization of the pretrained transformer. We first note that the training of the transformer model can be viewed as a performative prediction problem, and that existing methods and theories largely ignore or cannot resolve the out-of-distribution issue that arises. We propose a natural solution that includes the transformer-generated action sequences in the training procedure, and it enjoys better properties both numerically and theoretically. The availability of the optimal actions in the considered tasks also allows us to analyze the properties of the pretrained transformer as an algorithm, and explains why it may lack exploration and how this can be automatically resolved. Numerically, we categorize the advantages of the pretrained transformer over structured algorithms such as UCB and Thompson sampling into three cases: (i) it better utilizes the prior knowledge in the pretraining data; (ii) it can elegantly handle the misspecification issue suffered by the structured algorithms; (iii) for short time horizons such as $T\le 50$, it behaves more greedily and enjoys much better regret than the structured algorithms, which are designed for asymptotic optimality.
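The out-of-distribution fix described above, including transformer-generated action sequences in the training data, can be illustrated with a toy bandit setup: histories are rolled out either with optimal actions (plain supervised pretraining) or with the model's own actions, while the supervision target is always the optimal action. The small MLP over summary statistics stands in for the transformer, and all shapes and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

K, HORIZON = 2, 20                                  # arms, decision steps

def sample_task():
    """A toy 2-armed Gaussian bandit; the optimal action is the arm with the larger mean."""
    means = torch.rand(K)
    return means, int(means.argmax())

def featurize(counts, sums):
    """Fixed-length summary of the interaction history (per-arm counts and mean rewards)."""
    avg = sums / counts.clamp(min=1)
    return torch.cat([counts / HORIZON, avg])

model = nn.Sequential(nn.Linear(2 * K, 64), nn.ReLU(), nn.Linear(64, K))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def run_episode(means, optimal, use_model_actions):
    counts, sums, loss = torch.zeros(K), torch.zeros(K), 0.0
    for _ in range(HORIZON):
        logits = model(featurize(counts, sums))
        # Supervision target is always the optimal action for this task.
        loss = loss + nn.functional.cross_entropy(logits.unsqueeze(0),
                                                  torch.tensor([optimal]))
        # Behavioural action: either the model's own choice (on-policy context)
        # or the optimal action (as in plain supervised pretraining).
        a = int(logits.argmax()) if use_model_actions else optimal
        r = means[a] + 0.1 * torch.randn(())
        counts[a] += 1; sums[a] += r
    return loss / HORIZON

for step in range(500):
    means, optimal = sample_task()
    # Mix optimal-action histories with model-generated histories so training
    # covers the contexts the model will actually encounter at deployment.
    loss = run_episode(means, optimal, use_model_actions=(step % 2 == 1))
    opt.zero_grad(); loss.backward(); opt.step()
```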
Thai Universal Dependency Treebank
Sriwirote, Panyut, Leong, Wei Qi, Polpanumas, Charin, Thanyawong, Santhawat, Tjhi, William Chandra, Aroonmanakun, Wirote, Rutherford, Attapol T.
Automatic dependency parsing of Thai sentences has been underexplored, as evidenced by the lack of large Thai dependency treebanks with complete dependency structures and the lack of a published systematic evaluation of state-of-the-art models, especially transformer-based parsers. In this work, we address these problems by introducing the Thai Universal Dependency Treebank (TUD), a new Thai treebank, the largest to date, consisting of 3,627 trees annotated in accordance with the Universal Dependencies (UD) framework. We then benchmark dependency parsing models that incorporate pretrained transformers as encoders, training them on Thai-PUD and our TUD. The evaluation results show that most of our models outperform models reported in previous papers and provide insight into the optimal choice of components for Thai dependency parsers. The new treebank and every model's full predictions from our experiments are made available in a GitHub repository for further study.
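For context, the standard way to build such parsers is to put a biaffine arc scorer on top of a pretrained transformer encoder; the sketch below shows that combination at the subword-token level (a real parser would first pool subwords into word representations). The multilingual checkpoint, projection sizes, and example sentence are illustrative choices, not the specific models benchmarked in the paper.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# A pretrained multilingual encoder stands in for whichever Thai-capable
# transformer a parser might use.
name = "xlm-roberta-base"
tok = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

H, A = encoder.config.hidden_size, 256

class BiaffineArcScorer(nn.Module):
    """Biaffine arc scorer in the Dozat-and-Manning style:
    score[i, j] = how likely token j is the syntactic head of token i."""
    def __init__(self):
        super().__init__()
        self.dep = nn.Sequential(nn.Linear(H, A), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(H, A), nn.ReLU())
        self.W = nn.Parameter(torch.zeros(A, A))
        self.b = nn.Linear(A, 1, bias=False)

    def forward(self, x):                        # x: (seq, H) encoder states
        d, h = self.dep(x), self.head(x)         # (seq, A) each
        return d @ self.W @ h.T + self.b(h).T    # (seq, seq) arc scores

scorer = BiaffineArcScorer()
sentence = "นักวิจัยสร้างคลังข้อมูลภาษาไทย"          # "Researchers built a Thai corpus"
enc = tok(sentence, return_tensors="pt")
with torch.no_grad():
    states = encoder(**enc).last_hidden_state[0]
arc_scores = scorer(states)
pred_heads = arc_scores.argmax(dim=-1)           # predicted head index per token
```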